Concept Drift Detection via Equal Intensity k-Means Space Partitioning
Abstract
Data streams pose additional challenges to statistical classification tasks because the distributions of the training and target samples may differ as time passes. Such a distribution change in streaming data is called concept drift. Numerous histogram-based distribution change detection methods have been proposed to detect drift. Most histograms are built on grid-based or tree-based space partitioning algorithms, which makes the space partitions arbitrary and unexplainable and may cause drift blind spots. There is a need to improve the drift detection accuracy of histogram-based methods in the unsupervised setting. To address this problem, we propose a cluster-based histogram, called equal intensity k-means space partitioning (EI-kMeans). In addition, a heuristic method to improve the sensitivity of drift detection is introduced. The fundamental idea of improving the sensitivity is to minimize the risk of creating partitions in distribution offset regions. Pearson's chi-square test is used as the statistical hypothesis test so that the test statistics remain independent of the sample distribution. The number of bins and their shapes, which strongly influence the ability to detect drift, are determined dynamically from the sample based on an asymptotic constraint in the chi-square test. Accordingly, three algorithms are developed to implement concept drift detection, including a greedy centroids initialization algorithm, a cluster amplify-shrink algorithm, and a drift detection algorithm. For drift adaptation, we recommend retraining the learner if a drift is detected. The results of experiments on synthetic and real-world datasets demonstrate the advantages of EI-kMeans and show its efficacy in detecting concept drift.
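The general pipeline the abstract describes (partition the feature space into cluster-shaped bins, count the points of two windows per bin, then apply Pearson's chi-square test) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' EI-kMeans: it uses plain Lloyd's k-means in place of the greedy centroid initialization and the amplify-shrink step, and the function names `fit_centroids`, `assign`, and `detect_drift`, as well as the defaults `k=5` and `alpha=0.05`, are hypothetical choices made for the example.

```python
import numpy as np
from scipy.stats import chi2_contingency


def assign(points, centroids):
    """Return the index of the nearest centroid for every point (its histogram bin)."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def fit_centroids(reference, k=5, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the reference window (an assumed stand-in, not EI-kMeans)."""
    reference = np.asarray(reference, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct reference points chosen at random.
    centroids = reference[rng.choice(len(reference), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(reference, centroids)
        for j in range(k):
            members = reference[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def detect_drift(reference, test, centroids, alpha=0.05):
    """Chi-square test of homogeneity on the per-cluster counts of two windows."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    k = len(centroids)
    ref_counts = np.bincount(assign(reference, centroids), minlength=k)
    test_counts = np.bincount(assign(test, centroids), minlength=k)
    table = np.vstack([ref_counts, test_counts])
    table = table[:, table.sum(axis=0) > 0]  # drop bins that are empty in both windows
    stat, p_value, dof, expected = chi2_contingency(table)
    # Also report the smallest expected count so the caller can judge
    # whether the chi-square approximation is trustworthy.
    return bool(p_value < alpha), float(p_value), float(expected.min())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(0.0, 1.0, size=(500, 2))
    drifted = rng.normal(3.0, 1.0, size=(500, 2))  # mean shifted in both features
    centroids = fit_centroids(reference, k=5)
    print(detect_drift(reference, drifted, centroids))
```

The third return value ties back to the asymptotic constraint mentioned in the abstract: the chi-square approximation is usually considered reliable only when every expected bin count is large enough (a common rule of thumb is at least 5), which limits how many bins a sample of a given size can support. Whether the paper uses exactly that threshold is not stated in the abstract.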
Similar resources
Concept drift detection via competence models
Detecting changes of concepts, such as a change of customer preference for telecom services, is very important in terms of prediction and decision applications in dynamic environments. In particular, for case-based reasoning systems, it is important to know when and how concept drift can effectively assist decision makers to perform smarter maintenance operations at an appropriate time. This pa...
Concept drift detection in business process logs using deep learning
Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects, etc.), traditional process mining techniques ...
Adaptive Concept Drift Detection
Concept drift is an important problem in the context of machine learning and data mining. It can be described as a change in the fundamental concepts underlying the data, or, in its most basic form, as a significant change in the distribution of the data. From a learning theoretic point of view, one can say that concept drift is a violation of the i.i.d. assumption, which states that each examp...
Exploring Concept Representations for Concept Drift Detection
We present an approach to estimating concept drift in online news. Our method is to construct temporal concept vectors from topic-annotated news articles, and to correlate the distance between the temporal concept vectors with edits to the Wikipedia entries of the concepts. We find improvements in the correlation when we split the news articles based on the number of articles mentioning a concep...
Concept Drift Detection Through Resampling
Detecting changes in data-streams is an important part of enhancing learning quality in dynamic environments. We devise a procedure for detecting concept drifts in data-streams that relies on analyzing the empirical loss of learning algorithms. Our method is based on obtaining statistics from the loss distribution by reusing the data multiple times via resampling. We present theoretical guarant...
Journal
Journal title: IEEE Transactions on Cybernetics
Year: 2021
ISSN: 2168-2275, 2168-2267
DOI: https://doi.org/10.1109/tcyb.2020.2983962